- Path: bloom-beacon.mit.edu!senator-bedfellow.mit.edu!faqserv
- From: andrewh@speech.su.oz.au (Andrew Hunt)
- Newsgroups: comp.speech,comp.answers,news.answers
- Subject: comp.speech Frequently Asked Questions - part 1/3
- Supersedes: <comp-speech-faq/part1_764040899@rtfm.mit.edu>
- Followup-To: comp.speech
- Date: 16 Apr 1994 13:07:57 GMT
- Organization: Speech Technology Group, The University of Sydney
- Lines: 814
- Approved: news-answers-request@MIT.Edu
- Expires: 28 May 1994 13:05:48 GMT
- Message-ID: <comp-speech-faq/part1_766501548@rtfm.mit.edu>
- Reply-To: andrewh@speech.su.oz.au (Andrew Hunt)
- NNTP-Posting-Host: bloom-picayune.mit.edu
- Summary: Useful information about Speech Technology
- X-Last-Updated: 1994/04/06
- Originator: faqserv@bloom-picayune.MIT.EDU
- Xref: bloom-beacon.mit.edu comp.speech:2283 comp.answers:4932 news.answers:18146
-
- Archive-name: comp-speech-faq/part1
- Last-modified: 1994/04/06
-
-
- comp.speech
-
- Frequently Asked Questions
- ==========================
-
- This document is an attempt to answer commonly asked questions and to
- reduce the bandwidth taken up by these posts and their associated replies.
- If you have a question, please check this file before you post.
-
- The FAQ is not meant to discuss any topic exhaustively. It will hopefully
- provide readers with pointers on where to find useful information. It also
- tries to list useful material available elsewhere on the net.
-
- If you have not already read the Usenet introductory material posted to
- "news.announce.newusers", please do. For help with FTP (file transfer
- protocol) look for a regular posting of "Anonymous FTP List - FAQ" in
- comp.misc, comp.archives.admin or news.answers.
-
-
- This FAQ is posted every 4 weeks to comp.speech, comp.answers & news.answers.
-
-
- It is also available for anonymous ftp from the comp.speech archive site
- svr-ftp.eng.cam.ac.uk:/comp.speech/FAQ
- It is also available from the news.answers ftp site (and its mirrors) as
- rtfm.mit.edu:/pub/usenet/news.answers/comp-speech-faq
- It is also available by sending email to <mail-server@rtfm.mit.edu> with
- send usenet/news.answers/comp-speech-faq/*
- in one line of the body of the message.
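
As a rough illustration of the FTP route, the sketch below splits a "host:/path" location of the kind listed above and lists the files found there by anonymous FTP, using Python's standard ftplib module. The function names and the placeholder password are assumptions made for this example and are not part of the FAQ. The hosts are those given above and may no longer accept connections.

```python
from ftplib import FTP


def split_location(location):
    """Split a 'host:/path' string into a (host, path) pair."""
    host, _, path = location.partition(":")
    return host, path


def list_remote_dir(location, email="anonymous@example.org"):
    """Return the file names found at an anonymous FTP location.

    Hypothetical helper: it needs network access and a server that
    still accepts anonymous logins.
    """
    host, path = split_location(location)
    with FTP(host) as ftp:
        ftp.login("anonymous", email)  # anonymous login, email address as password
        ftp.cwd(path)
        return ftp.nlst()
```

For example, split_location("rtfm.mit.edu:/pub/usenet/news.answers/comp-speech-faq") gives the host and directory to pass to an FTP client by hand.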
-
-
- Admin
- -----
-
- This release brings updates on a number of synthesis and recognition
- products as well as a number of new entries. Keeping up-to-date with
- the increasing number of new Windows products is becoming more
- difficult. Any help with this will be greatly appreciated.
-
-
- Cheers,
-
- Andrew Hunt
- Speech Technology Research Group email: andrewh@speech.su.oz.au
- Department of Electrical Engineering Ph: 61-2-692 4509
- University of Sydney, NSW, Australia. Fax: 61-2-692 3847
-
-
- ========================== Acknowledgements ===========================
-
- Thanks to the following for their significant comments and contributions.
-
- Barry Arons <barons@media-lab.mit.edu>
- Joe Campbell <jpcampb@afterlife.ncsc.mil>
- Oliver Jakobs <jakobs@ldv01.Uni-Trier.de>
- Sonja Kowalewski <kowa@uniko.uni-koblenz.de>
- Tony Robinson <ajr@eng.cam.ac.uk>
- Mike <mike%jim.uucp@wupost.wustl.edu>
-
- Many others have provided useful information. Thanks to all.
-
-
- ============================ Contents =================================
-
- SECTION 1 - General
-
- Q1.1: What is comp.speech?
- Q1.2: Where are the comp.speech archives?
- Q1.3: Common abbreviations and jargon.
- Q1.4: What are related newsgroups and mailing lists?
- Q1.5: What are related journals and conferences?
- Q1.6: What resources are available as handicap aids?
- Q1.7: What speech data is available?
- Q1.8: Speech File Formats, Conversion and Playing.
- Q1.9: What "Speech Laboratory Environments" are available?
- Q1.10: Miscellaneous Software and Other Resources.
-
- SECTION 2 - Signal Processing for Speech
-
- Q2.1: What sampling do I need for speech?
- Q2.2: How do I find the pitch of a speech signal?
- Q2.3: How do I find the start and end points of a speech signal?
- Q2.4: Where can I find FFT software?
- Q2.5: What signal processing techniques are used in speech technology?
- Q2.6: What speech sampling and signal processing hardware can I use?
- Q2.7: How do I convert to/from mu-law format?
-
- SECTION 3 - Speech Coding and Compression
-
- Q3.1: Speech compression techniques.
- Q3.2: What are some good references/books on coding/compression?
- Q3.3: What software is available?
-
- SECTION 4 - Natural Language Processing
-
- Q4.1: What are some good references/books on NLP?
- Q4.2: What NLP software is available?
-
- SECTION 5 - Speech Synthesis
-
- Q5.1: What is speech synthesis?
- Q5.2: How can speech synthesis be performed?
- Q5.3: What are some good references/books on synthesis?
- Q5.4: What software/hardware is available?
-
- SECTION 6 - Speech Recognition
-
- Q6.1: What is speech recognition?
- Q6.2: How can I build a very simple speech recogniser?
- Q6.2: What does speaker dependent/adaptive/independent mean?
- Q6.3: What does small/medium/large/very-large vocabulary mean?
- Q6.4: What does continuous speech or isolated-word mean?
- Q6.5: How is speech recognition done?
- Q6.6: What are some good references/books on recognition?
- Q6.7: What speech recognition packages are available?
-
- =======================================================================
-
- SECTION 1 - General
-
- Q1.1: What is comp.speech?
-
- comp.speech is a newsgroup for discussion of speech technology and
- speech science. It covers a wide range of issues from application of
- speech technology, to research, to products and lots more. By nature
- speech technology is an inter-disciplinary field and the newsgroup reflects
- this. However, computer application is the basic theme of the group.
-
- The following is a list of topics but does not cover all matters related
- to the field - no order of importance is implied.
-
- [1] Speech Recognition - discussion of methodologies, training, techniques,
- results and applications. This should cover the application of techniques
- including HMMs, neural-nets and so on to the field.
-
- [2] Speech Synthesis - discussion concerning theoretical and practical
- issues associated with the design of speech synthesis systems.
-
- [3] Speech Coding and Compression - both research and application matters.
-
- [4] Phonetic/Linguistic Issues - coverage of linguistic and phonetic issues
- which are relevant to speech technology applications. Could cover parsing,
- natural language processing, phonology and prosodic work.
-
- [5] Speech System Design - issues relating to the application of speech
- technology to real-world problems. Includes the design of user interfaces,
- the building of real-time systems and so on.
-
- [6] Other matters - relevant conferences, books, public domain software,
- hardware and related products.
-
- ------------------------------------------------------------------------
-
- Q1.2: Where are the comp.speech archives?
-
- comp.speech is being archived for anonymous ftp.
-
- ftp site: svr-ftp.eng.cam.ac.uk (or 129.169.24.20).
- directory: comp.speech/archive
-
- comp.speech/archive contains the articles as they arrive. Batches of 100
- articles are grouped into a shar file, along with an associated file of
- Subject lines.
-
- Other useful information is also available in comp.speech/info.
-
- ------------------------------------------------------------------------
-
- Q1.3: Common abbreviations and jargon.
-
- ANN - Artificial Neural Network.
- ASR - Automatic Speech Recognition.
- ASSP - Acoustics Speech and Signal Processing
- AVIOS - American Voice I/O Society
- CELP - Code-book excited linear prediction.
- COLING - Computational Linguistics
- DTW - Dynamic time warping.
- FAQ - Frequently asked questions.
- HMM - Hidden Markov model.
- IEEE - Institute of Electrical and Electronics Engineers
- JASA - Journal of the Acoustical Society of America
- LPC - Linear predictive coding.
- LVQ - Learning vector quantisation.
- NLP - Natural Language Processing.
- NN - Neural Network.
- TI - Texas Instruments.
- TIMIT - A big speech database from TI and MIT - see Q1.7
- TTS - Text-To-Speech (i.e. synthesis).
- VQ - Vector Quantisation.
-
- ------------------------------------------------------------------------
-
- Q1.4: What are related newsgroups and mailing lists?
-
-
- NEWSGROUPS
-
- comp.ai - Artificial Intelligence newsgroup.
- Postings on general AI issues, language processing and AI techniques.
- Has a good FAQ including NLP, NN and other AI information.
-
- comp.ai.nat-lang - Natural Language Processing Group
- Postings regarding Natural Language Processing. Set up to cover
- a broad range of related issues and different viewpoints.
-
- comp.ai.nlang-know-rep - Natural Language Knowledge Representation
- Moderated group covering Natural Language.
-
- comp.ai.neural-nets - discussion of Neural Networks and related issues.
- There are often postings on speech related matters - phonetic recognition,
- connectionist grammars and so on.
-
- comp.compression - occasional articles on compression of speech.
- FAQ for comp.compression has some info on audio compression standards.
-
- comp.dcom.telecom - Telecommunications newsgroup.
- Has occasional articles on voice products.
-
- comp.dsp - discussion of signal processing - hardware and algorithms and more.
- Has a good FAQ posting.
- Has a regular posting of a comprehensive list of Audio File Formats.
-
- comp.multimedia - Multi-Media discussion group.
- Has occasional articles on voice I/O.
-
- sci.lang - Language.
- Discussion about phonetics, phonology, grammar, etymology and lots more.
-
- alt.sci.physics.acoustics - some discussion of speech production & perception.
-
- alt.binaries.sounds.misc - posting of various sound samples
- alt.binaries.sounds.d - discussion about sound samples, recording and playback.
-
-
- MAILING LISTS
-
- ECTL - Electronic Communal Temporal Lobe
- Founder & Moderator: David Leip
- Moderated mailing list for researchers with interests in computer speech
- interfaces. This list serves a broad community including persons from
- signal processing, AI, linguistics and human factors.
-
- To subscribe, send the following information to:
- ectl-request@snowhite.cis.uoguelph.ca
- name, institute, department, daytime phone & e-mail address
-
- To access the archive, ftp snowhite.cis.uoguelph.ca, login as anonymous,
- and supply your local userid as a password. All the ECTL things can be
- found in pub/ectl.
-
- Prosody Mailing List
- Unmoderated mailing list for discussion of prosody. The aim is
- to facilitate the spread of information relating to the research
- of prosody by creating a network of researchers in the field.
- If you want to participate, send the following one-line
- message to "listserv@msu.edu" :-
-
- subscribe prosody Your Name
-
- foNETiks
- A moderated monthly newsletter distributed by e-mail. It carries
- job advertisements, notices of conferences, and other news of
- general interest to phoneticians, speech scientists and others.
- The editors are Linda Shockey and Gerry Docherty. To subscribe
- send the following 1 line message to 'mailbase@mailbase.ac.uk'
-
- join fonetiks your_first_name your_second_name
-
- Digital Mobile Radio
- Covers many areas, including some speech topics such as speech
- coding and speech compression.
- Mail Peter Decker (dec@dfv.rwth-aachen.de) to subscribe.
-
- ------------------------------------------------------------------------
-
- Q1.5: What are related journals and conferences?
-
- Try the following commercially oriented magazines:-
-
- Speech Technology - no longer published
- Voice Technology News
-
- Try the following technical journals (some contact addresses below):-
-
- IEEE Transactions on Speech and Audio Processing (from Jan 93)
- IEEE Transactions on Acoustics, Speech, and Signal Processing
- (ASSP) - now obsolete.
- Computational Linguistics (COLING)
- Computer Speech and Language
- Journal of the Acoustical Society of America (JASA)
- Transactions of IEEE ASSP
- AVIOS Journal
- ASR News
-
- Try the following conferences:-
-
- ICASSP Intl. Conference on Acoustics Speech and Signal Processing (IEEE)
- ICSLP Intl. Conference on Spoken Language Processing
- EUROSPEECH European Conference on Speech Communication and Technology
- AVIOS American Voice I/O Society Conference
- SST Australian Speech Science and Technology Conference
- SpeechTech
-
-
- Here are a few contact addresses:-
-
- Publications: IEEE Transactions on Speech and Audio Processing (from Jan 93)
- IEEE Transactions on Acoustics, Speech, and Signal Processing
- (ASSP) - now obsolete.
- Organization: Institute of Electrical and Electronics Engineers (IEEE)
- Address: IEEE Service Center
- 445 Hoes Lane
- PO Box 1331
- Piscataway, NJ 08855, USA
- Phone number: 1-800-678-IEEE
- (201)981-0060
-
- Publications: Computer Speech and Language
- Organization: Academic Press, Ltd.
- Address: 24-28 Oval Rd
- London NW1
- England
- Price: $136 (Institutions), $58 (Individuals)
-
- Publications: Computational Linguistics
- Organization: Association for Computational Linguistics
- Address: MIT Press Journals
- 55 Hayward St
- Cambridge, MA 02142
- Phone number: (617)253-2889
-
-
- ------------------------------------------------------------------------
-
- Q1.6: What resources are available as handicap aids?
-
- Can anyone provide information on speech technology aids for the deaf,
- blind, speech impaired, physically impaired and other groups who may
- benefit from speech technology?
-
-
- Product Name: SpeechViewer II
- Platform: IBM Machines from Mod 25 on.
- Description: SpeechViewer II is a speech therapy tool. It provides
- graphical feedback of various speech features so that speech
- impaired individuals can improve their speech. It works with an
- audio bandwidth of 7.3 kHz and thus allows the therapist to work
- with sustained vowels and fricatives. A wide range of graphics
- is used to provide adequate variability to hold client interest.
- An extensive set of statistics is gathered, which allows a therapist
- to do research or keep therapy records.
- The speech therapy modules are:
- o Awareness - Sound, Loudness, Pitch, Voicing Onset, Voicing
- o Skill Building - Pitch, Voicing, Phonology
- o Patterning - Pitch & Loudness - Waveform & Spectrogram, Spectra
- o Clinical Management - Profiles, Models, Client Data
- Hardware: Requires an IBM M-ACPA (Multimedia-Audio Capture Playback
- Adapter). It has a TI TMS320C25 DSP chip. The input sampling
- rate is 44.1 kHz stereo, 88.2 kHz mono. This is a 16 bit card.
- It has the following jacks: mic in, stereo line in, stereo line
- out, speaker out. Note: This card is being replaced by Mwave
- technology. For more info on Mwave contact Texas Instruments.
- Price: The software is $2130 list, $1491 educational, part number 92F2066.
- The M-ACPA is $370 list, $222 educational, part number 92F3378.
- The MicroChannel adapter part number is 92F3379 (same price).
- Contact: The Psychological Corporation (TPC) [IBM Authorized Remarketer]
- Phone: 1-800-228-0752
- Or contact IBM on 1-800-426-4832.
-
- ------------------------------------------------------------------------
-
- Q1.7: What speech data is available?
-
- A wide range of speech databases have been collected. These databases
- are primarily for the development of speech synthesis/recognition and for
- linguistic research.
-
- Some databases are free but most appear to be available for a small cost.
- The databases normally require lots of storage space - do not expect to be
- able to ftp all the data you want.
-
- [There are too many to list here in detail - perhaps someone would like to
- set up a special posting on speech databases?]
-
-
- PHONEMIC SAMPLES
- ================
-
- First, some basic data. The following sites have samples of English phonemes
- (American accent I believe) in Sun audio format files. See Question 1.8
- for information on audio file formats.
-
- sounds.sdsu.edu:/.1/phonemes
- phloem.uoregon.edu:/pub/Sun4/lib/phonemes
- sunsite.unc.edu:/pub/multimedia/sun-sounds/phonemes
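
For readers who want to check a downloaded phoneme file, here is a minimal sketch of reading the fixed header of a Sun audio file in Python. It assumes the usual layout of six big-endian 32-bit fields (magic number, data offset, data size, encoding, sample rate, channels). The function and constant names are made up for this example, and the sketch ignores any annotation text that may follow the fixed header.

```python
import struct

AU_MAGIC = 0x2E736E64  # the four bytes ".snd"
AU_MULAW_8 = 1         # encoding code for 8-bit mu-law samples


def parse_au_header(data):
    """Parse the fixed 24-byte header at the start of a Sun audio file."""
    if len(data) < 24:
        raise ValueError("too short for a Sun audio header")
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun audio file")
    return {
        "data_offset": offset,   # byte position where the samples start
        "data_size": size,       # number of bytes of sample data
        "encoding": encoding,    # 1 means 8-bit mu-law
        "sample_rate": rate,     # samples per second
        "channels": channels,
    }
```

A typical use would be to read the first 24 bytes of a file opened in binary mode and pass them to parse_au_header before deciding how to play or convert the samples.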
-
-
- HOMOPHONE LIST
- ==============
-
- A list of homophones in General American English is available by anonymous
- FTP from the comp.speech archive site:
-
- machine name: svr-ftp.eng.cam.ac.uk
- directory: comp.speech/data
- file name: homophones-1.01.txt
-
-
- LINGUISTIC DATA CONSORTIUM (LDC)
- ================================
-
- Information about the Linguistic Data Consortium is available via
- anonymous ftp from: ftp.cis.upenn.edu (130.91.6.8)
- in the directory: /pub/ldc
-
- Here are some excerpts from the files in that directory:
-
- Briefly stated, the LDC has been established to broaden the collection
- and distribution of speech and natural language data bases for the
- purposes of research and technology development in automatic speech
- recognition, natural language processing and other areas where large
- amounts of linguistic data are needed.
-
- Here is the brief list of corpora:
-
- * The TIMIT and NTIMIT speech corpora
- * The Resource Management speech corpus (RM1, RM2)
- * The Air Travel Information System (ATIS0) speech corpus
- * The Association for Computational Linguistics - Data Collection
- Initiative text corpus (ACL-DCI)
- * The TI Connected Digits speech corpus (TIDIGITS)
- * The TI 46-word Isolated Word speech corpus (TI-46)
- * The Road Rally conversational speech corpora (including "Stonehenge"
- and "Waterloo" corpora)
- * The Tipster Information Retrieval Test Collection
- * The Switchboard speech corpus ("Credit Card" excerpts and portions
- of the complete Switchboard collection)
-
- Further resources to be made available within the first year (or two):
-
- * The Machine-Readable Spoken English speech corpus (MARSEC)
- * The Edinburgh Map Task speech corpus
- * The Message Understanding Conference (MUC) text corpus of FBI
- terrorist reports
- * The Continuous Speech Recognition - Wall Street Journal speech
- corpus (WSJ-CSR)
- * The Penn Treebank parsed/tagged text corpus
- * The Multi-site ATIS speech corpus (ATIS2)
- * The Air Traffic Control (ATC) speech corpus
- * The Hansard English/French parallel text corpus
- * The European Corpus Initiative multi-language text corpus (ECI)
- * The Int'l Labor Organization/Int'l Trade Union multi-language
- text corpus (ILO/ITU)
- * Machine-readable dictionaries/lexical data bases (COMLEX, CELEX)
-
- The files in the directory include more detailed information on the
- individual databases. For further information contact
-
- Linguistic Data Consortium
- 441 Williams Hall
- University of Pennsylvania
- Philadelphia, PA 19104-6305
- Phone: +1 (215) 898-0464
- Fax: +1 (215) 573-2175
- e-mail: ldc@unagi.cis.upenn.edu
-
-
- Center for Spoken Language Understanding (CSLU)
- ===============================================
-
- 1. The ISOLET speech database of spoken letters of the English alphabet.
- The speech is high quality (16 kHz with a noise cancelling microphone).
- 150 speakers x 26 letters of the English alphabet twice in random order.
- The "ISOLET" data base can be purchased for $100 by sending an email request
- to vincew@cse.ogi.edu. (This covers handling, shipping and medium costs).
- The data base comes with a technical report describing the data.
-
- 2. CSLU has a telephone speech corpus of 1000 English alphabets. Callers
- recite the alphabet with brief pauses between letters. This database is
- available to not-for-profit institutions for $100. The data base is described
- in the proceedings of the International Conference on Spoken Language
- Processing. Contact vincew@cse.ogi.edu if interested.
-
-
- PhonDat - A Large Database of Spoken German
- ===========================================
-
- The PhonDat continuous speech corpora are now available on
- CD-ROM media (ISO 9660 format).
-
- PhonDat I (Diphone Corpus) : 6 CDs (1140.- DM)
- PhonDat II (Train Enquiries Corpus): 1 CD ( 190.- DM)
-
- PhonDat I comprises approx. 20.000, PhonDat II approx. 1500
- signal files in high quality 16-bit 16 kHz recording. The
- corpora come with documentation containing the orthographic
- transcription and a citation form of the utterances, as well as a
- detailed file format description. A narrow phonetic transcription
- is available for selected files from corpus I and II.
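
To give a feel for why such corpora are shipped on CD-ROM rather than by ftp, the sketch below estimates the size of uncompressed recordings at the 16-bit, 16 kHz format quoted above. It assumes single-channel audio with no file header or compression, which the source does not state, and the function name is invented for this example.

```python
def pcm_bytes(seconds, sample_rate_hz=16000, bits_per_sample=16, channels=1):
    """Approximate size in bytes of uncompressed PCM audio.

    Assumes mono audio by default and ignores any file header.
    """
    return round(seconds * sample_rate_hz * channels * bits_per_sample / 8)


one_second = pcm_bytes(1)   # bytes needed for one second of speech
one_hour = pcm_bytes(3600)  # bytes needed for one hour of speech
```

Under these assumptions one second of speech takes 32,000 bytes, so an hour takes roughly 115 million bytes.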
-
- For information and orders contact
-
- Barbara Eisen
- Institut fuer Phonetik
- Schellingstr. 3 / II
- D 80799 Munich 40
-
- Tel: +49 / 89 / 2180 -2454 or -2758
- Fax: +49 / 89 / 280 03 62
-
-
- Oxford Acoustic Phonetic Database
- =================================
-
- Available on compact Disc, from J.B. Pickering and B.S. Rosner.
- It contains data on vowel-consonant and consonant-vowel combinations
- in both stressed and unstressed locations. The languages covered
- include French, German, Hungarian, Italian, Japanese, British English,
- Spanish and English.
-
- Does anyone know a contact email or snail mail address?
-
- ------------------------------------------------------------------------
-
- Q1.8: Speech File Formats, Conversion and Playing.
-
- Section 2 of this FAQ has information on mu-law coding.
-
- A very good and very comprehensive list of audio file formats is prepared
- by Guido van Rossum. The list is posted regularly to comp.dsp and
- alt.binaries.sounds.misc, amongst others. It includes information on
- sampling rates, hardware, compression techniques, file format definitions,
- format conversion, standards, programming hints and lots more. It is much
- too long to include within this posting.
-
- It is also available by ftp
- from: ftp.cwi.nl
- directory: /pub
- file: AudioFormats<version>
-
- ------------------------------------------------------------------------
-
- Q1.9: What "Speech Laboratory Environments" are available?
-
- First, what is a Speech Laboratory Environment? A speech lab is a
- software package which provides the capability of recording, playing,
- analysing, processing, displaying and storing speech. Your computer
- will require audio input/output capability. The different packages
- vary greatly in features and capability - best to know what you want
- before you start looking around.
-
- Most general purpose audio processing packages will be able to process speech
- but do not necessarily have some specialised capabilities for speech (e.g.
- formant analysis).
-
- The following article provides a good survey.
-
- Read, C., Buder, E., & Kent, R. "Speech Analysis Systems: An Evaluation"
- Journal of Speech and Hearing Research, pp 314-332, April 1992.
-
-
- Package: Entropic Signal Processing System (ESPS) and Waves
- Platform: Range of Unix platforms.
- Description: ESPS is a very comprehensive set of speech analysis/processing
- tools for the UNIX environment. The package includes UNIX commands,
- and a comprehensive C library (which can be accessed from other
- languages). Waves is a graphical front-end for speech processing.
- Speech waveforms, spectrograms, pitch traces etc can be displayed,
- edited and processed in X windows and Openwindows (versions 2 & 3).
- The HTK (Hidden Markov Model Toolkit) is now available from Entropic.
- HTK is described in some detail in Section 6 of this FAQ - the
- section on Speech Recognition.
- Cost: On request.
- Contact: Entropic Research Laboratory, Washington Research Laboratory,
- 600 Pennsylvania Ave, S.E. Suite 202, Washington, D.C. 20003
- (202) 547-1420. email - info@wrl.epi.com
-
-
- Package: CSRE: Canadian Speech Research Environment
- Platform: IBM/AT-compatibles
- Description: CSRE is a comprehensive, microcomputer-based system designed
- to support speech research. CSRE provides a powerful, low-cost
- facility in support of speech research, using mass-produced and
- widely-available hardware. The project is non-profit, and relies
- on the cooperation of researchers at a number of institutions and
- fees generated when the software is distributed. Functions
- include speech capture, editing, and replay; several alternative
- spectral analysis procedures, with color and surface/3D displays;
- parameter extraction/tracking and tools to automate measurement
- and support data logging; alternative pitch-extraction systems;
- parametric speech (KLATT80) and non-speech acoustic synthesis,
- with a variety of supporting productivity tools; and a
- comprehensive experiment generator, to support behavioral testing
- using a variety of common testing protocols.
- A paper about the whole package can be found in:
- Jamieson D.G. et al, "CSRE: A Speech Research Environment",
- Proc. of the Second Intl. Conf. on Spoken Language Processing,
- Edmonton: University of Alberta, pp. 1127-1130.
- Hardware: Can use a range of data acquisition/DSP
- Cost: Distributed on a cost recovery basis.
- Availability: For more information on availability
- contact Krystyna Marciniak - email march@uwovax.uwo.ca
- Tel (519) 661-3901 Fax (519) 661-3805.
- For technical information - email ramji@uwovax.uwo.ca
- Note: Also included in Q5.4 on speech synthesis packages.
-
-
- Package: OGI Speech Tools from the Center for Spoken Language
- Understanding (CSLU) at the Oregon Graduate Institute of Science
- and Technology (Portland Oregon)
- Platform: Unix????
- Description: The OGI Speech tools include :-
- 1. An X windows display tool (LYRE) for displaying data in a time
- synchronous fashion for a. the speech signal b. spectrograms
- c. phoneme labels, and other information.
- 2. A Neural Network (NOPT) training package.
- 3. A set of C library routines (LIBNSPEECH) for the manipulation
- of speech data, including: a. PLP Analysis, b. Rasta PLP
- Analysis, c. Linear Predictive Coding, d. Mel Cepstrum Coding,
- e. Fast Fourier Transform
- 4. A set of utilities for converting file formats such as ADC, NIST,
- mu-law, binary files, and ascii. Includes filtering.
- 5. A database utility (find_phone) to automate speech database
- related enquiries. It allows the user to specify a particular
- label or set of labels in a given context, display all occurrences
- of the label, and relabel the occurrences if desired.
- 6. A Vector-Quantizer based on the Linde Buzo and Gray (LBG)
- algorithm.
- 7. A set of Perl scripts which have been used mainly to automate
- the use of the OGI Speech Tools.
- 8. MAN Pages for all routines and programs developed, as well as
- a User manual in both PostScript and TeX format.
- Misc: Software is written in ANSI C.
- Availability: By anonymous ftp from
- speech.cse.ogi.edu:/pub/tools/
- Contact: Try tools@cse.ogi.edu
-
-
- Package: Signalyze 3.0 from InfoSignal
- Platform: Macintosh
- Description: Signalyze's basic conception revolves around up to 100
- signals, displayed synchronously in HyperCard fashion on "cards".
- The program offers a complement of signal editing features,
- quite a few spectral analysis tools, manual scoring tools, pitch
- extraction routines, a good set of signal manipulation tools, and
- extensive input-output capacity.
- Handles multiple file formats: Signalyze, MacSpeech Lab, AudioMedia,
- SoundDesigner II, SoundEdit/MacRecorder, SoundWave, three sound
- resource formats, and ASCII-text.
- Sound I/O: Direct sound input from MacRecorder and similar devices,
- AudioMedia, AudioMedia II and AD IN, some MacADIOS boards and devices,
- Apple sound input (built-in microphone). Sound output via Macintosh
- internal sound, via SoundManager 3.0, some MacADIOS boards and devices
- as well as via the Digidesign 16-bit boards.
- It has a range of capabilities for creating, editing and manipulating
- label files with flexibility in labelling format.
- Compatibility: MacPlus and higher (including II, IIx, IIcx, IIci, IIfx,
- IIvx, IIvi, Portable, all PowerBooks, Centris and Quadras). Takes
- advantage of large and multiple screens and 16/256 color/grayscales.
- System 7.0 compatible. Runs in background with adjustable priority.
- Misc: A demo available upon request.
- Manuals and tutorial included.
- It is available in English, French, and German.
- An UPDATER to version 2.48 is now available in:
- - The UNIL Gopher server (see last page of InfoSignal News 8)
- - The LAIP FTP server. Address: MACFL4082.unil.ch, machine no.
- 130.223.104.31, login: anonymous, password: your email
- Also available are a demo program, and current questions and answers.
- Cost: Individual licence US$350, site license US$500, plus shipping.
- Upgrades from version 2.0 are available.
- Contact: North America - Network Technology Corporation
- 91 Baldwin St., Charlestown MA 02129
- Fax: 617-241-5064 Phone: 617-241-9205
- Elsewhere - InfoSignal Inc.
- C.P. 73, 1015 LAUSANNE, Switzerland,
- FAX: +41 21 691-1372,
- Email: 76357.1213@COMPUSERVE.COM.
-
-
- Package: Kay Elemetrics CSL (Computer Speech Lab) 4300
- Platform: Minimum IBM PC-AT compatible with extended memory (min 2MB)
- with at least VGA graphics. Optimal would be 386 or 486 machine
- with more RAM for handling larger amounts of data.
- Description: Speech analysis package, with optional separate LPC program
- for analysis/synthesis. Uses its own file format for data, but has
- some ability to export data as ascii. The main editing/analysis prog
- (but not the LPC part) has its own macro language, making it easy to
- perform repetitive tasks. Probably not much use without the extra
- LPC program, which also allows manipulation of pitch, formant and
- bandwidth parameters.
- Hardware includes an internal DSP board for the PC (requires ISA
- slot), and an external module containing signal processing chips
- which does A/D and D/A conversion.
- A speaker and microphone are supplied.
- Misc: A programmers kit is available for programming signal processing
- chips (experts only).
- Manuals included.
- Cost: Recently approx 6000 pounds sterling. (Less in USA?)
- Availability: UK distributors are Wessex Electronics,
- 114-116 North Street, Downend, Bristol, B16 5SE
- Tel: 0272 571404.
- In USA: Kay Elemetrics Corp,
- 12 Maple Avenue, PO Box 2025, Pine Brook, NJ 07058-9798
- Tel:(201) 227-7760
-
-
- Package: MacSpeech Lab II (MSL II)
- Platform: Macintosh
- Description: A sound analysis and acquisition package for Macs. MSL II delivers
- the most common functions for speech analysis (FFTs, LPCs, f0
- extraction, etc.) & produces grayscale spectrographic displays.
- Can be used for various speech technology and phonetic training
- tasks. The software can trade off accuracy and speed.
- Hardware: requires MacADIOS ("Macintosh Analog/Digital Input/Output
- System") hardware for speech I/O at 12/16 bits.
- Misc: Software no longer updated by GW Instruments; MSL soft/hardware will
- not perform input/output on Quadras, for example, though analysis
- seems fine. Known to operate properly on systems as high as IIcx &
- IIfx.
- Cost: $4990 (in May '92 price list; no MSL soft/hardware package
- listed in January '93).
- Contact: GW Instruments
- 35 Medford Street, Somerville, MA 02143
- Phone: (617) 625-4096 Fax: (617) 625-1322
-
-
- Package: Ptolemy
- Platform: Sun SPARC, DecStation (MIPS), HP (hppa).
- Description: Ptolemy provides a highly flexible foundation for the
- specification, simulation, and rapid prototyping of systems.
- It is an object oriented framework within which diverse models
- of computation can co-exist and interact. Ptolemy can be used
- to model entire systems.
- Ptolemy has been used for a broad range of applications including
- signal processing, telecommunications, parallel processing, wireless
- communications, network design, radio astronomy, real time systems,
- and hardware/software co-design. Ptolemy has also been used as a lab
- for signal processing and communications courses.
- Ptolemy has been developed at UC Berkeley over the past 3 years.
- Further information, including papers and the complete release
- notes, is available from the FTP site.
- Cost: Free
- Availability: The source code, binaries, and documentation are available
- by anonymous ftp from "ptolemy.berkeley.edu" - see the README file -
- ptolemy.berkeley.edu:/pub/README
-
-
- Package: Khoros
- Description: Public domain image processing package with a basic DSP
- library. Not particularly applicable to speech, but not bad
- for the price.
- Cost: FREE
- Availability: By anonymous ftp from pprg.eece.unm.edu
-
-
- Package: SpeechViewer II
- Description: Speech Therapy Tool
- See the detailed description in the handicap section (Q1.6).
-
-
-
- Can anyone provide information on capability and availability of the
- following package?
-
- ILS ("Interactive Laboratory System")
-
- ------------------------------------------------------------------------
-
- Q1.10: Miscellaneous Software and Other Resources.
-
- Resource: CMU dictionary
- Description: Phonemic transcriptions of 100,000 English words.
- (Presumably with American English pronunciation.)
- Availability: By anonymous ftp from
- ftp.cs.cmu.edu:project/fgdata/dict
-
-
- Package: Network Audio System Release 1.1
- Platforms: Various (includes SunOS, Solaris, SGI)
- Description: A device-independent mechanism for transferring, playing
- and recording audio signals over a network. Has a range of
- features suited to networks.
- Cost: Free
- Availability: By anonymous ftp from
- ftp.x.org:/contrib/netaudio/netaudio-1.1.tar.Z
- Also available in the same directory are document files and
- some sample sounds.
-
- Package: NEVOT (1.4v) from AT&T BL
- Platforms: Sun Sparc Station (SunOS 4.1.x) and Silicon Graphics
- Description: Audio-conferencing tool which supports both point-to-point
- and broadcasting of audio using multicast IP.
- Audio encoding:
- + PCM 64 kb/s 8-bit u-law encoded 8 kHz PCM (G.711)
- + ADPCM 32 kb/s [Sun only] (G.721)
- + DVI ADPCM 32 kb/s
- + ADPCM 24 kb/s [Sun only] (G.723)
- + CELP 4.8 kb/s
- + LPC 2.4 kb/s
- Source is available.
- Availability: by anonymous ftp from
- gaia.cs.umass.edu:pub/nevot
- Contact: Henning Schulzrinne (hgs@research.att.com)
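
The bit rates in the encoding list above follow from simple arithmetic, sketched here in Python. The PCM figure uses the 8 kHz, 8-bit values stated in the list. Applying the same 8 kHz sampling rate to the ADPCM entries is an assumption made for this illustration, and the function names are invented. CELP and LPC code whole frames of speech rather than individual samples, so a per-sample figure is not meaningful for them.

```python
def bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate in bits per second of a sample-by-sample coder."""
    return sample_rate_hz * bits_per_sample


def bits_per_sample(bit_rate_bps, sample_rate_hz=8000):
    """Bits spent on each sample, assuming 8 kHz sampling by default."""
    return bit_rate_bps / sample_rate_hz


pcm_rate = bit_rate(8000, 8)             # u-law PCM: 64000 bits per second
adpcm_32_bits = bits_per_sample(32000)   # 32 kb/s ADPCM: 4 bits per sample
adpcm_24_bits = bits_per_sample(24000)   # 24 kb/s ADPCM: 3 bits per sample
```

Seen this way, 32 kb/s ADPCM halves the bandwidth of u-law PCM by spending 4 bits rather than 8 on each sample.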
-
-
-
- Andrew Hunt
- Speech Technology Research Group Ph: 61-2-692 4509
- Dept. of Electrical Engineering Fax: 61-2-692 3847
- University of Sydney, NSW, 2006, Australia email: andrewh@speech.su.oz.au
-